
MEASURING  THE  RELIABILITY  OF  EQUIPMENTS 

IN  OPERATING  ENVIRONMENTS 

David  S.  Stoller 

P-1672 

April  21,  1959 

Reproduced  by 

The  RAND  Corporation  •  Santa  Monica  •  California 


The views expressed in this paper are not necessarily those of the Corporation



MEASURING THE RELIABILITY OF EQUIPMENTS IN OPERATING ENVIRONMENTS

A few years ago, "reliability" was a jargon word. Today we find it in
the vocabulary of the informed lay public. News magazines and newspapers
use the term frequently in articles on missiles and other equipments.

As operations researchers, we subscribe to the scientific principle that
knowledge derives ultimately from measurement. The purpose of this paper is
to discuss some of the concepts and problems pertinent to the measurement of
reliability.

The definition of reliability must be well understood by those concerned
with measuring it. "Reliability" encompasses five concepts: a probability;
an aggregate; an environment; a set of criteria; a time interval. (See
Reference 1 for a detailed discussion.) A "barefoot" definition that
illustrates these points is: "Reliability is the chance that something does
what it is supposed to do, when and where it is supposed to do it."

It is evident from the definition above that there are infinitely many
reliabilities that describe the behavior of a particular aggregate. The
so-called "storage life" of an equipment involves the probability that the
equipment passes a specified test after a specified amount of time in a
specified environment. For different testing procedures (visual inspection,
continuity check, static operation, etc.) the value of "storage life" will
be different.
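
The dependence of a reliability figure on its qualifying concepts can be
sketched in a few lines. The record layout, the function name, and the
storage-life figures below are assumptions introduced purely for
illustration; they do not come from Reference 1.

```python
from collections import defaultdict
from typing import NamedTuple


class Trial(NamedTuple):
    aggregate: str
    environment: str
    criterion: str        # the test procedure that defines success
    interval_days: int
    passed: bool


def reliabilities(trials):
    """Fraction of trials passed, keyed by the four qualifying concepts."""
    counts = defaultdict(lambda: [0, 0])      # key -> [passes, total]
    for t in trials:
        key = (t.aggregate, t.environment, t.criterion, t.interval_days)
        counts[key][0] += int(t.passed)
        counts[key][1] += 1
    return {key: p / n for key, (p, n) in counts.items()}


# Hypothetical storage-life data: the same kind of unit, the same depot,
# the same 180 days, but two different testing procedures.
trials = (
    [Trial("gyro", "depot storage", "visual inspection", 180, True)] * 19
    + [Trial("gyro", "depot storage", "visual inspection", 180, False)] * 1
    + [Trial("gyro", "depot storage", "static operation", 180, True)] * 16
    + [Trial("gyro", "depot storage", "static operation", 180, False)] * 4
)
result = reliabilities(trials)
```

Each key in `result` is a separate reliability for the same aggregate; under
these made-up figures the two testing procedures give two different values
of "storage life".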

Let us visualize a particular aggregate as an entity which is "born"
(accepted into an operational inventory), "grows up" (installation and
debugging at an operational site), "matures" (performs useful tasks,
preventive and corrective maintenance occurs), and "dies" (removed from
operational inventory for overhaul, or undergoes a destructive mission, such
as a launching).




Many reliability measurement systems concentrate too heavily on the
"death" of the individual aggregate. This is certainly important for
inventory calculations. However, a major portion of the cost of an operating
inventory of equipment in being over a period of time results from
maintenance requirements generated during the "life" of the aggregate.
Therefore, it is important to measure the reliability of equipments for
environments, criteria, and time intervals which are appropriate to aggregate
"life situations", such as checkouts, storage, etc. (Failure models
appropriate to this concept are discussed in References 2 and 3.)

Many attempts have been made to measure the reliability of aggregates
on the basis of data collected for other purposes, for example, data on the
issues of aggregates from a stockage point. However, issues from a stockage
point are caused by many factors other than malfunctions of operating
aggregates: changes in operating policies, mandatory replacements,
anticipated consumption, "hoarding," alternative uses, and so forth. (See
Reference 4 for a discussion of this problem.) It can readily be seen that
inferences on reliability made from data recorded for other purposes are, at
best, of limited value and in many cases misleading.

In general, the measurement of reliability should be absolute, rather
than relative. It is of some value to know that aggregate "A" had twice as
many malfunctions as aggregate "B", but very little manipulation can be
performed from such data. The significance of changes in relative reliability
is difficult, if not impossible, to perceive.
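
A small numerical sketch may make the point concrete. The function names
and all counts and hours below are hypothetical; malfunctions per operating
hour is used here only as one possible absolute measure.

```python
def malfunction_rate(malfunctions, operating_hours):
    """Absolute measure: malfunctions per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating hours must be positive")
    return malfunctions / operating_hours


def expected_malfunctions(rate, planned_hours):
    """An absolute rate can be carried forward into planning arithmetic."""
    return rate * planned_hours


# Two hypothetical cases that share the same relative statement,
# "A had twice as many malfunctions as B": (malfunctions, operating hours).
case_1 = {"A": (20, 1000.0), "B": (10, 1000.0)}   # equal exposure
case_2 = {"A": (20, 4000.0), "B": (10, 1000.0)}   # A operated four times as long

for case in (case_1, case_2):
    rates = {name: malfunction_rate(m, h) for name, (m, h) in case.items()}
    print(rates, expected_malfunctions(rates["A"], 500.0))
```

In the first case "A" malfunctions more often per hour than "B"; in the
second it malfunctions less often, although the relative count is
unchanged. Only the absolute rates support further calculation, such as the
expected number of malfunctions in a planned period of operation.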

There are many types of operating environments to which an aggregate is
exposed during its life span: transportation; storage; turn-on (off); ground
operation; in-flight operation; and maintenance. (See Reference 2.) A
reliability measurement system must be flexible enough to permit recording of
life events in terms of the operating environments which significantly affect
a particular aggregate. The flexibility should work in another direction
also, and permit dropping from the data recording system those environments
which are found to be of little or no significance in the reliability history
of a particular type of aggregate. Flexibility in these two directions
implies that the reliability measurement system should permit addition to the
data recording of environments which are to be tested for significance.
Finally, the system should permit facile selection and revision of the
criteria of successful performance of the aggregate.
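
One way to picture this flexibility is a record of life events in which the
operating environment is simply a tag, so that environments can be added or
dropped without redesigning the record. The class and method names below
are assumptions made for this sketch, and the figures are invented.

```python
from collections import Counter


class LifeEventLog:
    """Life events tagged by operating environment (illustrative sketch)."""

    def __init__(self, recorded_environments):
        self.recorded = set(recorded_environments)
        self.events = []      # (serial, environment, hours, malfunction)

    def add_environment(self, name):
        """Start recording an environment that is to be tested for significance."""
        self.recorded.add(name)

    def drop_environment(self, name):
        """Stop recording an environment found to be of little significance."""
        self.recorded.discard(name)

    def record(self, serial, environment, hours, malfunction):
        """Store the event only if its environment is currently recorded."""
        if environment not in self.recorded:
            return False
        self.events.append((serial, environment, float(hours), bool(malfunction)))
        return True

    def malfunctions_per_hour(self):
        """Malfunctions per exposure hour, by environment."""
        hours, malfunctions = Counter(), Counter()
        for _, environment, h, m in self.events:
            hours[environment] += h
            malfunctions[environment] += int(m)
        return {e: malfunctions[e] / hours[e] for e in hours if hours[e] > 0}


log = LifeEventLog({"storage", "ground operation"})
log.record("G-001", "storage", 100, False)
log.add_environment("transportation")      # now to be tested for significance
log.record("G-001", "transportation", 10, True)
rates = log.malfunctions_per_hour()
```

Revision of the criteria of successful performance is not modeled here; in
this sketch the originator simply reports whether a malfunction occurred.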

Reliability factors are highly interdependent with operations,
maintenance, and supply activities since it is the exercise of the equipment
in these environments that exposes the equipment to the occurrence of
malfunctions. Therefore, a reliability measurement system set up separately
and in parallel to existing operations, maintenance, and supply systems will
result in a high degree of redundancy in data collecting and processing. Cost
considerations dictate that every consideration should be given to designing
the reliability measurement system to be consolidated with the data system(s)
which record(s) the operations, maintenance, and supply actions related to
the equipments. (See Reference 5 for a more detailed discussion of these
considerations.)

Data systems in general should adhere to the principle of simplicity of
data input. This is particularly true of an operational reliability
measurement system, which usually requires that data elements be generated at
a large number of operating locations by many individuals at different times.
Insofar as feasible, the operations of referencing, classification,
arithmetic, summarization and analysis should not be performed by the agent
who originates data elements. As opposed to the situation in the development
and testing phase, the reliability history of an equipment in its operational
phase is observed by individuals who are not trained or oriented in
scientific methods and who do not appreciate the need for complete and
accurate data which will be used elsewhere for analysis and planning.
Further, it is usually more efficient to forward data elements from several
operating sites, batch them, and have the above mentioned operations
performed by specialists.

Before being accepted for further processing, the data should be checked
for accuracy and completeness. The latter problem deserves special note,
since it includes checking for data which has not been submitted. The
magnitude of this problem can be judged from reported experience by Western
Electric, ARINC Research Corporation, and many other organizations, on
reliability reports from unmonitored operational points. The experience is
that an information loss of 70 per cent or more occurs under circumstances
where no operating site data discipline is encountered.
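
Checking for data which has not been submitted presupposes some schedule of
what ought to have been reported. The sketch below assumes, purely for
illustration, that every site owes one report per week; the site names, the
schedule, and the function names are hypothetical.

```python
def missing_reports(expected, received):
    """Reports that were due but never submitted, in a stable order."""
    return sorted(set(expected) - set(received))


def information_loss(expected, received):
    """Fraction of the expected reports that are missing."""
    expected = set(expected)
    if not expected:
        return 0.0
    return len(expected - set(received)) / len(expected)


# Hypothetical reporting schedule: five sites, two weekly reports each.
sites = ["site-1", "site-2", "site-3", "site-4", "site-5"]
weeks = [1, 2]
expected = [(site, week) for site in sites for week in weeks]
received = [("site-1", 1), ("site-1", 2), ("site-3", 1)]

print(missing_reports(expected, received))
print(information_loss(expected, received))
```

With three of ten scheduled reports on hand, the loss in this invented
example is 70 per cent, the order of magnitude cited above for unmonitored
operating sites.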

Two of the most important alternatives in estimating equipment reliability
are (1) analysis of the reliability history of relatively large aggregates
of equipment: missile, ground guidance, ground checking, etc.; and (2)
analysis of the reliability history of relatively small aggregates of
equipment: tubes, resistors, gyros, etc., combined with synthesis to obtain
estimates of the reliability of the larger aggregates. These approaches
result in vastly different reliability measurement systems. Approach number
(1) results in a system which monitors the operational history of relatively
few major individual complex aggregates. Approach number (2) results in a
system which monitors the operational history of a relatively large number
of individual simple sub-aggregates.
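
The synthesis step in approach (2) can be illustrated with the simplest
textbook model: a series arrangement in which the aggregate works only if
every sub-aggregate works and failures are independent. That model is an
assumption made for this sketch; the paper does not prescribe a particular
method of synthesis, and the figures are invented.

```python
import math


def series_reliability(sub_reliabilities):
    """Reliability of an aggregate that needs every sub-aggregate to work.

    Assumes independent failures (an illustrative series model).
    """
    values = list(sub_reliabilities)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie in [0, 1]")
    return math.prod(values)


# Hypothetical aggregate built from 50 sub-aggregates of reliability 0.99 each.
aggregate = series_reliability([0.99] * 50)
print(round(aggregate, 3))
```

Under these assumptions even highly reliable sub-aggregates synthesize to a
much lower aggregate reliability, which is one reason the two approaches
call for different volumes of data.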




The second approach appears to be desirable during the development and
initial operation of equipments, when the total number of equipments in the
inventory is small, and therefore the total number of individual
sub-aggregates is not too great. At this time in the life span of the
equipment, moreover, monitoring of the sub-aggregates is extremely desirable,
so that the redesign that is usually necessary to achieve fully operational
status can be based on the measured reliability behavior of sub-aggregates.
This also results in better predictions of the ultimate system reliability.

However, as the equipment reliability and performance improve, redesign
is (or should be) terminated or deferred, and as the operational inventory
increases, the number of individual sub-aggregates becomes very large, and
the complete monitoring of these would impose a severe data burden on
operating personnel. In addition, since extensive redesign during this
phase of equipment usage is undesirable, a complete reliability history of
all the sub-aggregates is unnecessary; the reliability history needed for
follow-on provisioning, etc., can well come from a sampling of sub-aggregates
rather than complete monitoring.
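
The trade between sampling and complete monitoring can be sketched as
follows. The example simulates an inventory whose true state is known, which
is of course not the case in practice; the function name, the inventory
size, and the use of a simple random sample with the usual binomial standard
error and finite population correction are all assumptions of this sketch.

```python
import math
import random


def sample_estimate(population, sample_size, seed=0):
    """Estimate the fraction of satisfactory units from a simple random sample.

    population: list of booleans, True meaning no malfunction was recorded.
    Returns (estimate, standard_error); the standard error uses the binomial
    formula with a finite population correction.
    """
    n_total = len(population)
    if n_total < 2 or not 1 <= sample_size <= n_total:
        raise ValueError("need len(population) >= 2 and 1 <= sample_size <= len")
    sample = random.Random(seed).sample(population, sample_size)
    p = sum(sample) / sample_size
    fpc = (n_total - sample_size) / (n_total - 1)
    return p, math.sqrt(p * (1.0 - p) / sample_size * fpc)


# Hypothetical inventory: 1000 sub-aggregates, 900 of them satisfactory.
inventory = [True] * 900 + [False] * 100
estimate, std_error = sample_estimate(inventory, 200)      # monitor one in five
census, census_error = sample_estimate(inventory, 1000)    # complete monitoring
```

Complete monitoring removes the sampling error entirely, but at five times
the recording burden of the sample in this example; whether that is worth
the cost depends on the use to which the estimate is put.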

Also, during this period emphasis shifts to complete monitoring of the
reliability history of the large aggregates of equipment, since the emphasis
is now on the employment of the equipment rather than its development.
Planning shifts to alternative employment and support policies for
aggregates, rather than redesign within aggregates.

An approach to this problem that merits investigation can be briefly
described as: (1) Define several "standard" operating environments for large
aggregates; (2) Describe stresses on each sub-aggregate of interest for each
"standard" environment; (3) Record only the reliability history of aggregates
in terms of standard environments and exceptions; (4) Correlate these
estimates to "bench" reliability.
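
The four steps above might be sketched as follows. Everything specific in
the sketch is an assumption: the environment names, the treatment of stress
as a simple multiplier on a bench failure rate, the numerical values, and
the function names. It is one possible reading of the approach, not a
method given in the paper.

```python
# Step 1: hypothetical "standard" environments for the large aggregate.
STANDARD_ENVIRONMENTS = ("storage", "ground operation", "in-flight operation")

# Step 2: assumed stress multipliers on each sub-aggregate, relative to bench.
STRESS = {
    "gyro": {"storage": 0.1, "ground operation": 1.0, "in-flight operation": 5.0},
    "tube": {"storage": 0.0, "ground operation": 1.0, "in-flight operation": 2.0},
}

# Assumed bench failure rates, in failures per hour.
BENCH_RATE = {"gyro": 1e-4, "tube": 5e-4}


def predicted_failures(history_hours):
    """Expected failures per sub-aggregate implied by the bench rates.

    history_hours is the step 3 record: hours the aggregate spent in each
    standard environment. Anything else must be recorded as an exception.
    """
    unknown = set(history_hours) - set(STANDARD_ENVIRONMENTS)
    if unknown:
        raise ValueError(f"not a standard environment: {sorted(unknown)}")
    return {
        part: sum(BENCH_RATE[part] * STRESS[part][env] * hours
                  for env, hours in history_hours.items())
        for part in STRESS
    }


def observed_to_predicted(observed, predicted):
    """Step 4: compare field experience with the bench-based prediction."""
    return {part: observed[part] / predicted[part]
            for part in predicted if predicted[part] > 0}


history = {"storage": 1000.0, "ground operation": 100.0, "in-flight operation": 10.0}
prediction = predicted_failures(history)
```

Only the aggregate's hours by environment are recorded in the field; the
sub-aggregate figures are derived centrally, which is the saving in data
burden that the approach is meant to achieve.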

Finally, let us note some of the difficulties that face a reliability
measurement system. One of the most vexing problems is the classification
of malfunctions in terms of degradation of the effectiveness of the
equipment. A malfunction can be defined (Reference 1) as a state of equipment
performance which produces operator dissatisfaction. It is apparent that
there are degrees of dissatisfaction, ranging, say, from mild annoyance to
complete disgust. However, we cannot ignore the human factor in the
definition of a malfunction, since each instance of dissatisfaction generates
a maintenance action. Even those maintenance actions which conclude that no
"real" malfunction existed result in the consumption of maintenance
resources: skilled manpower, test equipment, parts (used in the testing
process), and test facilities. Of course, many malfunctions can be more
objectively defined, and then operator dissatisfaction has correspondingly
less of a subjective component to it.

Another factor which poses a problem to the reliability measurement
system is that of the fragmentation and recomposition of aggregates. As parts
and major assemblies are received, repaired, operated, and replaced, the
composition of the individual aggregate changes over time, so that knowledge
of the time history of configuration must be made available to the consumer
of the data collected in the reliability measurement system. Now, some degree
of configuration knowledge must be maintained continuously for other
purposes, such as operations planning; therefore, it does not seem to be
desirable to establish a separate data requirement on this for reliability
purposes only, but to consolidate the perhaps more detailed reliability
measurement needs with operational management needs and have each consumer
of the data draw upon the same file.
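
A time history of configuration can be held as a single file of
installation records from which any consumer reconstructs the composition of
an aggregate on a given day. The record layout, serial numbers, and function
name below are hypothetical, chosen only to illustrate the idea.

```python
from typing import NamedTuple, Optional


class Installation(NamedTuple):
    part_serial: str
    aggregate_serial: str
    installed_day: int
    removed_day: Optional[int]    # None while the part is still installed


def configuration_on(day, aggregate_serial, installations):
    """Serial numbers of the parts installed in one aggregate on a given day."""
    return sorted(
        rec.part_serial
        for rec in installations
        if rec.aggregate_serial == aggregate_serial
        and rec.installed_day <= day
        and (rec.removed_day is None or day < rec.removed_day)
    )


# Hypothetical history: gyro-7 is removed from M-1, repaired, and later
# installed in a different aggregate, M-2.
history = [
    Installation("gyro-7", "M-1", 0, 40),
    Installation("gyro-9", "M-1", 40, None),
    Installation("gyro-7", "M-2", 55, None),
]

print(configuration_on(10, "M-1", history))
print(configuration_on(60, "M-2", history))
```

The same records serve operations planning (what is installed now) and
reliability analysis (what was installed when a malfunction occurred), so
that no separate reliability-only file is needed.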

Another problem of importance is the magnitude of the data processing
workload. The application of sampling methods to the measuring of reliability
of equipment populations can alleviate this to some degree. In the past,
there has been such a paucity of data on equipment reliability that
additional data always seemed to be desirable. However, due partially to the
implications of present and near future high volume, high speed data
processing equipment, data systems planning has reached the point where there
is a possibility of collecting too much data, including reliability data. The
concepts of statistical decision theory (Reference 6) are very appropriate
to this problem in considering the cost and value of information in
determining the extent of the sampling effort required.
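
A deliberately simplified illustration of weighing the cost of data against
its value follows. It assumes a fixed cost per record and a loss
proportional to the variance of the estimated reliability; both assumptions,
the prior guess at the reliability, and all numerical values are invented
for the sketch, and the treatment is far narrower than the general framework
of Reference 6.

```python
def total_cost(n, cost_per_record, loss_per_unit_variance, p_guess):
    """Cost of collecting n records plus expected loss from estimation error."""
    sampling_cost = cost_per_record * n
    variance = p_guess * (1.0 - p_guess) / n    # variance of a sample proportion
    return sampling_cost + loss_per_unit_variance * variance


def best_sample_size(max_n, cost_per_record, loss_per_unit_variance, p_guess):
    """Sample size between 1 and max_n with the smallest total cost."""
    return min(
        range(1, max_n + 1),
        key=lambda n: total_cost(n, cost_per_record, loss_per_unit_variance, p_guess),
    )


# Hypothetical figures: 1 unit of cost per record, a loss of 40,000 units per
# unit of variance, and a prior guess that the reliability is near 0.9.
n_best = best_sample_size(500, 1.0, 40000.0, 0.9)
print(n_best, total_cost(n_best, 1.0, 40000.0, 0.9))
```

Beyond the best sample size each additional record costs more than the
reduction in expected loss it buys, which is the sense in which too much
data can be collected.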




REFERENCES


1. Scott, ».a. aLJza&m Reliability Measurement, Vol. 1, ARINC
Research Corporation, 101-6-129, 12 December 1958, pp. 5-9.

2. Stoller, David S. "A Failure Model for Equipments Undergoing Complex
Operation," Operations Research, Volume 6, 1958, pp. 723-728.

3. Landers, Richard R. Methods for Measuring, jngjTnlnr- Predicting
Reliability and Performance of Large, Complex Electronic Systems. The
General Electric Company, Syracuse, New York (undated).

4. Brown, Bernice B. Characteristics of Demand for Aircraft Spare Parts.
The RAND Corporation Report R-292, July 1956.

5. Stoller, David S. and Van Horn, Richard L. Design of a Management
Information System. The RAND Corporation Paper P-1362, May 10, 1958.

6. Wald, Abraham. Statistical Decision Functions. Wiley, 1950.


